Abstract. A common approach in epidemiology is to study the transmission
of a disease in a population where each individual is initially susceptible (S),
may become infective (I), and is then removed or recovered (R) and plays no
further epidemiological role. Much of the recent work gives explicit
consideration to the network of social interactions or disease-transmitting
contacts and the attendant probability of transmission for each interacting
pair. The state of such a network is an assignment of the values {S, I, R} to
its members. Given such a network, an initial state and a particular
susceptible individual, we would like to compute their probability of becoming
infected in the course of an epidemic. It turns out that this and related
problems are NP-hard. In particular, they belong to a class of problems for
which no efficient algorithms are known. Moreover, finding an efficient
algorithm for the solution of any problem in this class would entail a major
breakthrough in theoretical computer science.



1. Introduction 

Mathematical modelling of epidemics is often traced to the celebrated SIR model
of Kermack and McKendrick [1]. This model posits a population of constant size
whose members fall into one of three classes: susceptible (S), infective (I) and
removed (R). Approximating these as continuous and assuming well-mixing, i.e.,
each individual is in equal contact with and equally likely to infect each other
individual, allows for an approximate description of the infection dynamics using
ordinary differential equations (ODE).

Clearly, as has been argued in many theoretical [2, 3, 4] as well as
experimental studies [5], the well-mixing assumption is not an accurate
representation of real contact patterns. Thus, much recent work has focused on
the role of the network of disease-transmitting contacts. (Reviewed in [6]. See
also [7, 8]. For a comparison of well-mixed and network-based models, see [9].)
Indeed, Kermack and McKendrick's ODE model arises as the limiting case of a
simplistic network model in which each individual has an equal chance of
infecting every other. However, real-world social contact networks exhibit
complex patterns of interconnection between individuals. Further, the
probability of transmitting disease from one individual to another depends on
the nature, frequency and duration of the contact as well as the immune
competence of the target individual. This leads to a modelling formalism of
social networks as a probabilistic graph $Q = (G, \Pr)$. Here $G$ is the graph
$G = (V, E)$, each vertex $u \in V$ is an individual, each edge
$e = (u, v) \in E$ records the fact that $u$ might infect $v$, and
$\Pr : E \to [0, 1]$ gives the probability that $u$ infects $v$ if $u$ becomes
infective while $v$ is susceptible. In this formalism, $Q$ is a fixed graph $G$
with labelling $\Pr$.
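The formalism above maps directly onto a small data structure. The following is a minimal sketch, assuming that directed edges are stored as dictionary keys and that the vertex set is implied by the edges; the helper name `make_network` and the three-person example are hypothetical and not taken from the paper.

```python
from typing import Dict, Hashable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def make_network(pr: Dict[Edge, float]) -> Tuple[Set[Hashable], Dict[Edge, float]]:
    """Validate the labelling Pr and return (V, Pr); E is the key set of Pr."""
    vertices: Set[Hashable] = set()
    for (u, v), p in pr.items():
        if u == v:
            raise ValueError(f"self-loop on {u!r} is not a contact")
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"Pr({u!r}, {v!r}) = {p} is outside [0, 1]")
        vertices.update((u, v))
    return vertices, dict(pr)


# Hypothetical three-person network: the edge (u, v) means "u might infect v".
V, Pr = make_network({("ann", "bob"): 0.3, ("bob", "cat"): 0.5, ("ann", "cat"): 0.1})
```

Storing only the edges that carry a probability keeps the representation sparse, which matches the point that real contact networks are far from the complete graph of the well-mixed model.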

This relatively new modelling paradigm has triggered an enormous amount of re- 
search in theoretical epidemiology. The field has greatly benefited from approaches 
that range from applications of bond percolation theory and other techniques from 
statistical physics [IOl[IIl[3l[l2l[l3l[Ill[l5l[l6l[l7l[l8]to large scale simulation en- 
deavours [H [m [20l EU [22] . Given that this mathematical formalism seems accurate 
and powerful enough to describe the spread of infectious diseases, the natural
question arises as to whether calculations performed within this formalism can
be used in practical situations to make useful predictions. Such calculations
are based on potentially measurable parameters such as network topology and
transmission probabilities [23]. For instance, one could attempt to calculate
the probability that, given a social contact network Q, an epidemic starting
with a set P of infectives results in the infection of an initially susceptible
individual u. Are there any computational limitations when trying to calculate
such quantities? If yes, how limiting are they? Fortunately, to address the
computational issues associated with this and similar calculations, we do not
need to start from scratch, given that network engineers have studied
essentially the same problems since the 1970s.

In the era of electronically digitized information and digital computers,
communications networks have become the largest and are among the most
important networks. The size of these networks is increasing exponentially. For
instance, the size of the Internet has shown exponential growth since its
creation in the early nineties (http://www.isc.org/). As the components of such
networks are subject to failure, engineers face the problem of designing,
constructing and operating networks that meet the required standards of
reliability. Of particular interest is the estimation of how reliable a given
network is in performing its function, provided some knowledge about the
reliability of its components is available. In many cases, the functionality of
the network can be expressed as the ability of its topology to support the
network's operation. In other words, the network is functional if and only if
certain connectivity properties are fulfilled. Consider a network of computers
which use this network to transmit messages. Let us suppose that each of these
computers is reliable, but that each communication link has some chance of
failure when called upon to transmit a message. We then encounter the same
formalism explained above for social networks. A communications network is
given by $Q = (G, \Pr)$ where each vertex $u \in V$ is a computer, each edge
$e = (u, v) \in E$ is a communication link and $\Pr : E \to [0, 1]$ gives the
reliability of the communication link from u to v. One might ask, given a
communications network Q, a set of computers P and a computer $u \notin P$, if
the computers in P all send a message, what is the chance it will reach u? We
will see that this is the same problem we stated above in the context of
epidemics on social contact networks.

It has long been known in the communications network literature that this prob- 
lem is computationally intractable. A standard benchmark of computational com- 
plexity is the class of NP-complete problems. This class has the following proper- 
ties: 



• At present, no algorithm for an NP-complete problem is known to have a 
running time which is bounded by a polynomial. Indeed, many algorithms 
for NP-complete problems have exponential running time. It is unknown
whether any NP-complete problem can be solved by an algorithm with
polynomial running time.

• If any problem in this class can be solved by an algorithm whose running 
time is bounded by a polynomial, then every problem in this class can be 
solved by an algorithm whose running time is bounded by a polynomial. 

In view of the second property, it is considered unlikely that any NP-complete
problem has a polynomial time solution. The communication among computers
problem (and hence the epidemiology problem) described above is known to be at
least as hard as any NP-complete problem. Such problems are termed NP-hard.
This is not the first problem in network epidemiology known to be NP-hard.
Previously known examples include the following: given a social contact network
and limited resources,

• What is the optimal strategy for vaccinating a limited number of individ- 
uals? 

• What is the optimal strategy for quarantining a limited number of individ- 
uals? 

• What is the optimal strategy for placement of a limited number of sensors 
for monitoring the course of an epidemic? 

(See [24, 25, 26, 27].) These problems involve the search for an optimum among
subsets of the vertices or edges of the given social contact network. It might
be hoped that finding the probability of infection of a single individual would
be computationally less demanding. As the engineers have taught us, this is not
so. While this result has recently been reported in the physics and operations
research community [28], it seems almost unknown among epidemiologists.

This article is organized as follows: In Section 2 we give a very brief overview
of the relevant concepts and methods in computational complexity. This provides
the unacquainted reader with the basic tools for understanding the main message
of this paper. Section 3 provides the elementary formal mathematical framework
for studying SIR epidemics on networks, including the connection with percolation
theory. In Section 4 we present a series of problems that have been studied in
network engineering and demonstrate their structural isomorphism with certain
problems concerning SIR epidemics on networks. Section 5 is devoted to studying
the computational complexity of extended/generalized epidemiological problems.
We finish in Section 6 with some concluding remarks.

2. Computational complexity 

In this section we give a brief account of the class NP-complete. This class is a
common benchmark for describing problems which are algorithmically soluble but
computationally intractable. For those wishing a fuller account we recommend [29].

In describing the class NP-complete, it is useful to describe the class P, and 
necessary to describe the class NP. These classes of problems are defined in terms 
of computational complexity. 

The computational complexity of a problem $\Pi$ is measured in terms of the
running time necessary for an algorithm which solves $\Pi$. Defining these terms
requires some preliminaries. First, note that a problem $\Pi$ consists of a
collection of instances, $D_\Pi$. Thus, "Determine whether 18 is composite" is
an instance of the problem, "For any integer n, determine whether n is
composite." This is an example of a decision problem, that is, for each
instance, the answer is either "yes" or "no". A decision problem $\Pi$ can be
formalized as the pair $(D_\Pi, Y_\Pi)$, where $Y_\Pi \subseteq D_\Pi$ consists
of the yes instances. In this example, $D_\Pi$ is the set of integers and
$Y_\Pi$ is the set of composite integers. We will refer to this problem as
$\Pi_{\text{composite}}$.

Notice that each instance $\pi$ of $\Pi$ has a size $\ell(\pi)$ and that the
computational cost of solving the problem grows with the size of the instance.
In this example, the size $\ell(n)$ of the instance n is the number of digits in
n. If we then have an algorithm M which solves $\Pi$, we can consider the
running time $\tau_M(\pi)$ required by M when applied to the instance $\pi$.
This could be measured in elapsed time or in terms of the number of steps
carried out by M in this computation. We can then define the running time of M
to be

$$T_M(n) = \begin{cases} 0 & \text{if } \{\pi \mid \ell(\pi) = n\} = \emptyset, \\ \max\{\tau_M(\pi) \mid \ell(\pi) = n\} & \text{otherwise.} \end{cases}$$

The class P consists of those decision problems which can be solved with a
polynomial running time. Stated formally, a decision problem $\Pi$ belongs to
the class P if there is an algorithm M which solves $\Pi$ and a polynomial
$p(n)$ such that the running time of M on instances of size n satisfies
$T_M(n) \le p(n)$. An example of a problem in the class P is
$\Pi_{\text{mult}}$. An instance of $\Pi_{\text{mult}}$ is three integers, a, b
and c. The size of an instance is the total number of digits in a, b and c.
These constitute a yes instance if $a \times b = c$.

The class NP consists of non-deterministic polynomial time problems. That is,
a decision problem is NP if a machine which is allowed to guess can verify a yes
instance in polynomial time. $\Pi_{\text{composite}}$ provides an example of a
problem which is NP. Given an instance of $\Pi_{\text{composite}}$, i.e., an
integer c, if c is, in fact, composite, a correct guess as to its factors a and
b can be verified in polynomial time by calling $\Pi_{\text{mult}}$. One can
define this class in terms of the operation of non-deterministic Turing
machines. See, for example, [30]. Clearly $\mathrm{P} \subseteq \mathrm{NP}$. In
view of the perceived complexity of many problems in NP, it is generally
believed that $\mathrm{P} \neq \mathrm{NP}$.
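The difference between verifying a guess and searching for one can be made concrete. The sketch below is illustrative only: `verify_composite` plays the role of the polynomial-time check of a guessed certificate (a, b), while `is_composite_by_search` is a naive trial-division search, included purely for contrast. Both function names are hypothetical.

```python
def verify_composite(c: int, a: int, b: int) -> bool:
    """Check a guessed certificate (a, b) for the claim "c is composite"."""
    # Both factors must be non-trivial and the multiplication check must pass.
    return 1 < a < c and 1 < b < c and a * b == c


def is_composite_by_search(c: int) -> bool:
    """Naive search for a certificate: try every candidate divisor up to sqrt(c)."""
    a = 2
    while a * a <= c:
        if c % a == 0:
            return True
        a += 1
    return False
```

Verification costs one multiplication and a few comparisons, which is polynomial in the number of digits of c. The trial-division loop may run for about sqrt(c) iterations, which is exponential in the number of digits.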

The class NP-complete consists of the hardest problems in NP. The problems
in NP-complete have the following property: Suppose that $\Pi_1$ is NP-complete.
Suppose that $\Pi_2$ is NP. Then there is an algorithm M which translates any
instance $\pi_2$ of $\Pi_2$ into an instance $\pi_1$ of $\Pi_1$ such that
$\pi_1$ is a yes instance of $\Pi_1$ if and only if $\pi_2$ is a yes instance of
$\Pi_2$. Further, both the computational cost of translating $\pi_2$ into
$\pi_1$ and the size $\ell(\pi_1)$ are bounded by a polynomial in
$\ell(\pi_2)$. It follows that if any NP-complete problem can be solved
(deterministically) in polynomial time, then every NP problem can be solved in
polynomial time. Put another way, if any NP-complete problem can be solved in
polynomial time, we will then have P = NP.

Hundreds of problems are known to be NP-complete [29]. These come from fields
such as graph theory, number theory, scheduling, code optimization and many
others. They are widely believed to be intrinsically intractable, but this
remains an open question. Other problems which are not necessarily NP-complete
(e.g., because they are not decision problems) are known to be at least as
hard. This is because for such a problem, say $\Gamma$, there is an NP-complete
problem $\Pi$ that can be reduced to $\Gamma$, where the computational cost of
this reduction is bounded by a polynomial in the size of the instance
considered. Thus, $\Gamma$ can be used to solve $\Pi$. These problems are called
NP-hard. Since NP-complete problems transform to each other, all NP-complete
problems can be solved by a reduction to an NP-hard problem. NP-hard problems
are found in fields as diverse as epidemiology and origami [31].

3. SIR epidemics on networks

We start by describing a network SIR model in which both the population and
the individual transmission probabilities are constant with respect to time.

A state of this system is the assignment of each individual to one of the classes 
S, I or R. The transmission probabilities determine who can infect whom and con- 
sequently which states can follow a given state. Indeed, they also determine the 
probability that any one of these states follows the given state. An epidemic is a 
sequence of states each of which is a possible successor of the previous state. Con- 
sequently, given an initial state, we can speak of the probability that an epidemic 
evolves through a given sequence of states and the probability that it arrives at a 
particular state. Let us formalize this. 

As above, a social contact network is a pair $Q = (G, \Pr)$ where G is the graph
with vertex set V and edge set E. Each edge has the form $(u, v)$ with
$u, v \in V$ and $u \neq v$. The function Pr assigns a probability to each edge,
that is, $\Pr : E \to [0, 1]$. The states of Q are given by¹

$$\mathrm{St}(Q) = \{\varphi \mid \varphi : V \to \{S, I, R\}\}.$$

Given states $\varphi_1$ and $\varphi_2$, the state $\varphi_2$ is a possible
successor of $\varphi_1$ if it satisfies the following conditions:

(1) If $\varphi_1(u) = R$, then $\varphi_2(u) = R$. (Recovered individuals stay
recovered.)

(2) If $\varphi_1(u) = I$, then $\varphi_2(u) = R$. (Infected individuals recover
in one step.)

(3) If $\varphi_1(u) = S$, then $\varphi_2(u) \in \{S, I\}$. (Susceptible
individuals either stay susceptible or become infected.)

(4) If $\varphi_2(u) = I$, then $\varphi_1(u) = S$ and there is a vertex
$q \in V \setminus \{u\}$ and an edge $(q, u)$ with $\varphi_1(q) = I$.
(Infected individuals were susceptible and were infected by a neighbour.)
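The four conditions can be checked mechanically. The following is a minimal sketch, assuming that states are dictionaries from vertices to the strings "S", "I" and "R" and that edges are directed pairs (q, u); the function name `is_possible_successor` is hypothetical.

```python
def is_possible_successor(phi1, phi2, edges):
    """Return True if state phi2 may follow state phi1 under conditions (1)-(4).

    phi1, phi2: dicts mapping every vertex to "S", "I" or "R".
    edges: iterable of directed pairs (q, u) meaning q might infect u.
    """
    infectors = {}  # u -> set of in-neighbours q that are infective in phi1
    for q, u in edges:
        if q != u and phi1[q] == "I":
            infectors.setdefault(u, set()).add(q)
    for u in phi1:
        before, after = phi1[u], phi2[u]
        if before == "R" and after != "R":              # condition (1)
            return False
        if before == "I" and after != "R":              # condition (2)
            return False
        if before == "S" and after not in ("S", "I"):   # condition (3)
            return False
        if after == "I" and (before != "S" or u not in infectors):  # condition (4)
            return False
    return True
```

For a path a → b → c with only a infective, the state in which a has recovered and b is infective is a possible successor, whereas a state in which c becomes infective at once is not, since c has no infective neighbour.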

The requirement that individuals recover in exactly one time step might appear
to be a drastic oversimplification. However, the formalism is rich enough to
accommodate patterns of latency and extended periods of infectivity. This can be
done by replacing the individual represented by vertex u by a sequence of
vertices $u_1, u_2, \ldots$ representing u on day 1, u on day 2, etc. See, e.g.,
[32].

An epidemic $\Phi$ is a sequence of states $\varphi_1, \ldots, \varphi_k$ where
$\varphi_{i+1}$ is a possible successor of $\varphi_i$ for
$i = 1, \ldots, k - 1$. The length of this epidemic is $\ell(\Phi) = k$. Since
individuals recover after one step, infection must be transmitted or die out. As
a consequence, no epidemic can be longer than the longest self-avoiding path in
G, for otherwise, it must infect some vertex twice. If we assume that each edge
transmits or fails to transmit independently, then it is not hard to compute the
probability that a susceptible individual is infected by its infected
neighbours. This, in turn, allows one to compute the probability that a state
$\varphi_1$ is followed by a particular successor state $\varphi_2$. Let us
denote this probability by $\Pr(\varphi_2 \mid \varphi_1)$. This system enjoys
the Markov property, that is, the probability of a given state depends only on
the previous state. Thus given an initial state $\varphi_1$, the probability of
the epidemic $\Phi = (\varphi_1, \ldots, \varphi_n)$ is

$$\Pr(\Phi \mid \varphi_1) = \prod_{i=2}^{n} \Pr(\varphi_i \mid \varphi_{i-1}).$$

¹ In particular, a state $\varphi$ can be seen as a subset of the Cartesian
product $V \times \{S, I, R\}$, and therefore, it is meaningful to speak of the
probability of a state or of a collection of states.
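The one-step probability and the Markov product can be sketched as follows. This is an illustration under the independence assumption stated in the text: a susceptible vertex escapes infection only if every infective in-neighbour fails to transmit. The names `transition_probability` and `epidemic_probability` and the dictionary representation are assumptions made for this example.

```python
from math import prod


def transition_probability(phi1, phi2, pr):
    """Pr(phi2 | phi1), assuming each edge transmits independently.

    pr maps directed edges (q, u) to transmission probabilities.
    Returns 0.0 when phi2 is not a possible successor of phi1.
    """
    total = 1.0
    for u, before in phi1.items():
        after = phi2[u]
        if before in ("I", "R"):
            # Infectives recover in one step; recovered individuals stay recovered.
            if after != "R":
                return 0.0
            continue
        # before == "S": u escapes only if every infective in-neighbour fails.
        escape = prod(1.0 - p for (q, v), p in pr.items()
                      if v == u and phi1[q] == "I")
        if after == "S":
            total *= escape
        elif after == "I":
            total *= 1.0 - escape
        else:
            return 0.0
    return total


def epidemic_probability(states, pr):
    """Probability of the epidemic (phi_1, ..., phi_n) given phi_1 (Markov product)."""
    return prod(transition_probability(a, b, pr) for a, b in zip(states, states[1:]))
```

Because each factor depends only on the preceding state, the probability of a whole epidemic is the product of its one-step transition probabilities.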

The probability that u becomes infected at the n-th step in the course of an
epidemic starting with $\varphi_1$ is

$$\Pr(\varphi_n(u) = I \mid \varphi_1) = \sum_{(\varphi_1, \ldots, \varphi_n) \,:\, \varphi_n(u) = I} \Pr(\varphi_1, \ldots, \varphi_n \mid \varphi_1).$$

Abusing notation, we denote the probability that u becomes infected in the
course of some epidemic starting with $\varphi_1$ by

$$\Pr(u \mid \varphi_1) = \sum_{j=1}^{n} \Pr(\varphi_j(u) = I \mid \varphi_1).$$

Note that since an infected individual becomes recovered at the next stage, no
epidemic appearing in this sum is an initial sub-epidemic of another.
Accordingly, these are disjoint cases.

We will be interested in initial states $\varphi_1$ consisting only of
infectives and susceptibles. In this case, we can identify $\varphi_1$ with the
set of infectives $P = \varphi_1^{-1}(I)$. This gives the notation
$\Pr(u \mid P)$.

Let us formalize the problem $\Pi_{\text{epidemic}}$ of finding $\Pr(u \mid P)$.
An instance $\pi$ of this problem consists of

• A graph $G = (V, E)$.

• A labelling² $\Pr : E \to [0, 1] \cap \mathbb{Q}$.

• An initial infective set $P \subseteq V$.

• An individual $u \in V \setminus P$.

A solution to $\pi$ is the value $\Pr(u \mid P)$.
We take $\ell(\pi) = |V|$.

The epidemiological viewpoint we have just described follows the evolution of 
probabilities over time. If we ignore the order of events, we come to the simpler 
viewpoint of percolation. Percolation methods have been used in epidemiology. 
(See, for example, [MJ [351 1311 131 133 1311 ■ The latter two contain extensive refer- 
ences.) Since an individual is only infected for one time step in the course of any 
epidemic, an edge can transmit at most once in the course of an epidemic. This
allows us to consider a random variable that takes as values subgraphs of G.
Given Q, we take $\mathbf{G}$ to be the random variable which takes values in
$\{G' = (V, E') \mid E' \subseteq E\}$. The probability that $\mathbf{G}$ takes
the value $G'$ is given by

$$\Pr(G') = \Big(\prod_{e \in E'} \Pr(e)\Big) \Big(\prod_{e \in E \setminus E'} (1 - \Pr(e))\Big).$$

We may think of $E'$ as determining whether $e = (u, v)$ transmits in the course
of an epidemic if that epidemic has a state $\varphi$ with $\varphi(u) = I$ and
$\varphi(v) = S$. Given a path $\tau$ in G, we will abuse notation by writing
$\tau \subseteq G$ and $e \in \tau$ for the edges of $\tau$. Given a path
$\tau$, the probability that it appears in $G' = (V, E')$ is

$$\Pr(\{G' \mid \tau \subseteq G'\}) = \prod_{e \in \tau} \Pr(e). \tag{1}$$

² There are technical issues here concerning the values of these probabilities.
To avoid these issues they are usually assumed to be rational numbers and bounds
are placed on the sizes of their denominators. For details, see [33]. Since
$\mathbb{Q}$ is dense in $\mathbb{R}$, this is not a limitation on the possible
probability values relevant in real applications.

For a proof of the following theorem, see, e.g., [32].

Theorem 1. Suppose Q is a social contact network. Then

$$\Pr(u \mid P) = \Pr(\{G' \mid G' \text{ contains a path from } P \text{ to } u\}).$$

In particular, $\Pr(u \mid P)$ is a finite sum of terms of the form (1).
Accordingly, it is a polynomial in the values $\Pr(e)$ with integer coefficients
and degree $|E|$.

This theorem provides the link between epidemiology and communications net- 
works. 
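As a small worked illustration of the theorem, consider a hypothetical network with vertices a, b, u, initial infective set P = {a}, and edges (a, u), (a, b), (b, u) carrying probabilities p, q, r respectively. These names and the example itself are not from the paper. Both viewpoints give the same integer polynomial:

```latex
% Percolation viewpoint: u is reached unless both paths a -> u and a -> b -> u fail.
\begin{align*}
\Pr(u \mid P) &= 1 - (1 - p)(1 - qr) = p + qr - pqr .
\end{align*}
% Epidemic viewpoint: u is infected at step 2 directly by a, or at step 3 via b.
\begin{align*}
\Pr(\varphi_2(u) = I \mid \varphi_1) &= p, \\
\Pr(\varphi_3(u) = I \mid \varphi_1) &= (1 - p)\, q\, r, \\
\Pr(u \mid P) &= p + (1 - p)\, q r = p + qr - pqr .
\end{align*}
```

Setting p = q = r turns this into the one-variable polynomial p + p^2 - p^3.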

4. NP-hard problems on communications networks: consequences for epidemiological calculations

We assume that a communications network consists of a set of computers, each
of which is reliable, and a set of communication links, each of which has a
known likelihood of failure, and that the communication links function or fail
independently. There is no loss of generality in regarding each node as
infallible, since a fallible computer can be modelled as a pair of nodes with a
fallible link connecting its input to its output. Once again, we can formalize
this as $Q = (G, \Pr)$, where $G = (V, E)$ represents installed capacity (V
being the set of computers and E the set of communication links),
$\Pr : E \to [0, 1]$ gives the reliability of each link and $\mathbf{G}$ is the
random variable assuming values in $\{G' = (V, E') \mid E' \subseteq E\}$. Each
$G' = (V, E')$ is the subnetwork of functioning links left after the failure of
the edges $e \in E \setminus E'$. Successful transmission of a message on this
network depends on the connectivity of the subgraph realized by $\mathbf{G}$.
Network engineers focus on several kinds of connectivity. We first examine two
of the simplest.

The two-terminal reliability problem is defined as the calculation of the probabil- 
ity that there is at least one correctly functioning path in the network connecting a 
predefined source node to a predefined target node. An instance $\pi$ of
$\Pi_{\text{two terminal}}$ consists of the following:

• A graph $G = (V, E)$.

• A labelling $\Pr : E \to [0, 1] \cap \mathbb{Q}$.

• A source terminal $u \in V$.

• A target terminal $v \in V \setminus \{u\}$.

A solution to $\pi$ is the value $\Pr(v \mid u)$.

By Theorem 1 this value is an integer polynomial in the values $\Pr(e)$. Thus,
if we restrict to the case where $\Pr$ takes a single value on all of E, this
becomes an integer polynomial in one variable called the reliability polynomial.
Thus a related problem is the following:

An instance $\pi$ of $\Pi_{\text{rel poly}}$ is

• A graph $G = (V, E)$.

• A source terminal $u \in V$.

• A target terminal $v \in V \setminus \{u\}$.

A solution to $\pi$ is the coefficients of the reliability polynomial. A number
of additional network reliability problems have been studied (see [33], an
excellent introduction to this field). These include

• k terminal reliability: This requires that k chosen terminals are mutually
pairwise connected.

• Broadcasting, also known as all terminal reliability: This requires that all
terminals are pairwise connected.

Naturally, in addition to the network reliability problems presented above, many
other reasonable problems can be defined or could arise from practical
applications. Formally, once a model $Q = (G, \Pr)$ of the network has been
chosen, a general mechanism to define a reliability problem is the following: A
network operation is specified by defining a set
$\mathrm{Op}(G) \subseteq \{G' = (V, E') \mid E' \subseteq E\}$ of states
considered to be functional. The set $\mathrm{Op}(G)$ is sometimes called a
stochastic binary system; the elements of $\mathrm{Op}(G)$ are termed pathsets.
Specifying the pathsets for G determines the whole stochastic binary system, and
therefore defines the network operation. The reliability problem consists of
finding the probability $\Pr(\mathrm{Op}(G))$ that the probabilistic graph
assumes values in the set $\mathrm{Op}(G)$.

A first naive algorithm to solve a network reliability problem formulated in
this general manner is to enumerate all states of the network (i.e., all
elements of the set $\{G' = (V, E') \mid E' \subseteq E\}$), determine whether a
given state is a pathset or not using some predesigned recognition procedure³,
and sum the occurrence probabilities of the pathsets. Due to the statistical
independence assumed, the probability of occurrence of a pathset is simply the
product of the operation probabilities of the edges in the pathset and the
failure probabilities of the edges not present in the pathset. Complete state
enumeration requires the generation of all $2^{|E|}$ states, implying that the
running time of this algorithm depends exponentially on the number of links in
the network.
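The naive algorithm can be sketched in a few lines. The version below is an illustration for the two-terminal style of operation (some source must reach the target); it assumes edge probabilities are given as exact `Fraction` values keyed by directed edge, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def reachable(sources, edges):
    """Vertices reachable from `sources` along the directed `edges` (depth-first)."""
    seen, stack = set(sources), list(sources)
    while stack:
        x = stack.pop()
        for a, b in edges:
            if a == x and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def reliability_by_enumeration(pr, sources, target):
    """Sum Pr(G') over all 2^|E| edge subsets E' that connect `sources` to `target`."""
    edges = list(pr)
    total = Fraction(0)
    for keep in product((False, True), repeat=len(edges)):
        # Probability of this state: working edges times failed edges.
        weight = Fraction(1)
        for e, kept in zip(edges, keep):
            weight *= pr[e] if kept else 1 - pr[e]
        working = [e for e, kept in zip(edges, keep) if kept]
        if target in reachable(sources, working):  # pathset recognition
            total += weight
    return total
```

The outer loop visits every one of the 2^|E| states, so this is only usable on very small networks; each additional link doubles the work.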

A substantial amount of effort has been put into finding more efficient algorithms 
for exact calculation of network reliability problems (see [33]). However, efficient 
exact solutions seem unlikely: 

Theorem 2. The problems $\Pi_{\text{two terminal}}$ and $\Pi_{\text{rel poly}}$ are NP-hard.

For a proof of this result, see, for instance, Theorem 1 in [38]. These
problems belong to the class #P-complete [39, 40, 41, 42, 43, 33]. #P is the set
of the counting problems associated with the decision problems in the set NP.
Thus, while a decision problem might ask whether something exists (e.g., an
assignment of truth values to a set of variables which satisfies a given
formula), the corresponding enumeration problem asks how many of these there
are. Solving the enumeration problem solves the corresponding decision problem
since knowing whether the number of these things is positive tells us whether
one exists. In particular, the counting version of any problem is always at
least as hard as the corresponding existence problem. In analogy to
NP-completeness, a problem is #P-complete if and only if it is in #P, and every
problem in #P can be reduced to it by a polynomial-time counting reduction (see
[29] for more details).

Corollary 1. The problem $\Pi_{\text{epidemic}}$ is NP-hard.

To see this, notice that every instance of $\Pi_{\text{two terminal}}$ is an
instance of $\Pi_{\text{epidemic}}$, namely, an instance in which P consists of
a single vertex.

³ Such recognition procedures generally boil down to path-finding or spanning
tree methods, which are efficient (i.e., of polynomial running time) and
well-known procedures in algorithmic graph theory and computer science.

More generally, despite dedicated efforts, no algorithm of polynomial running
time has been found that allows for the exact calculation of the probability
$\Pr(\mathrm{Op}(G))$ of a given set of pathsets $\mathrm{Op}(G)$, unless very
specific assumptions are made on the topology of the underlying probabilistic
network ([33, 44]). We consider it an open question as to which (if any) of
these more general network reliability problems (defined through the choice of a
suitable stochastic binary system $\mathrm{Op}(G)$) correspond to
epidemiological problems.



5. NP-hardness of extended problems in epidemiology

Epidemic on networks with time-varying transmission probabilities. As
we have seen in the previous section, the seemingly simple problem of finding an
individual's chances of infection is NP-hard. This is even so in the case where
the set of initial infectives is a single individual.

We can generalize $\Pi_{\text{epidemic}}$ by allowing transmission probabilities
to vary over time. We have seen that the length of any epidemic is at most the
length of the longest self-avoiding path in G. Consequently, time-varying
transmission probabilities can be encoded as

$$\Pr : E \times \{1, \ldots, |E|\} \to [0, 1].$$

In this case, percolation methods no longer apply. However, every instance of
$\Pi_{\text{epidemic}}$ can be mapped into an instance of this extended problem.
Thus, the time-varying version of this problem is NP-hard.

Epidemic on networks with disease latency. One might also generalize
$\Pi_{\text{epidemic}}$ to allow patterns of latency and extended periods of
infectivity⁴. We will take $\mathcal{I}$ to be a sequence of distinct states,
$\{I_1, I_2, \ldots, I_N\}$. We assume that for each stage $I_i$ there is an
infectivity $\mu_i$ and a probability of recovery $p_i$. We take $p_N = 1$. We
now consider a social contact network Q and infectivity pattern $\mathcal{I}$.
We refer to this as an $\{S, \mathcal{I}, R\}$ network. The states of this
network are

$$\{\varphi \mid \varphi : V \to \{S\} \cup \mathcal{I} \cup \{R\}\}.$$

We modify the definition of possible successor states so that the allowable
transitions are from $S$ to $I_1$, from $I_i$ to $I_{i+1}$ for
$i = 1, \ldots, N - 1$ and from $I_i$ to $R$ for $i = 1, \ldots, N$. If
$\varphi(u) = I_i$, u transitions to state R with probability $p_i$ and to state
$I_{i+1}$ with probability $1 - p_i$. If $e = (u, v) \in E$, $\varphi(u) = I_i$
and $\varphi(v) = S$, then u infects v with probability
$\Pr_{\mathcal{I}}(e, i) = \mu_i \Pr(e)$. We assume that $\mathcal{I}$ is
non-trivial in the sense that there is i with $\mu_i \neq 0$ and $p_j \neq 1$
for all $j < i$. This ensures that an infected individual has a positive
probability of reaching an infective state. As before, under the assumption that
transmissions and recoveries happen independently, we can develop an expression
for $\Pr_{\mathcal{I}}(u \mid P)$.

Fix $\mathcal{I}$. An instance of $\Pi_{\mathcal{I}}$ is an instance of
$\Pi_{\text{epidemic}}$.

A solution to $\Pi_{\mathcal{I}}$ is the value $\Pr_{\mathcal{I}}(u \mid P)$.

Theorem 3. Given a non-trivial infectivity pattern $\mathcal{I}$,
$\Pi_{\mathcal{I}}$ is NP-hard.

Lemma 1. Given $Q = (G, \Pr)$ and $\mathcal{I}$, there is $Q' = (G, \Pr')$ so
that for each $P \subseteq V$ and $u \notin P$,
$\Pr_{\mathcal{I}}(u \mid P) = \Pr'(u \mid P)$.

⁴ For a more general version of this see [32].
Proof. Consider an edge $e = (u, v)$. Suppose that $\varphi_1(u) = I_1$ and
$\varphi_1(v) = S$. What are the chances that v remains uninfected by u? (We
assume for the moment that v is not infected by some other neighbour during the
next N steps.) We take $\mu = \Pr(e)$. Let us denote by $\nu_i$ the probability
that u remains infected for i steps, but not i + 1 steps. We then have

$$\nu_i = p_i \prod_{j=1}^{i-1} (1 - p_j).$$

The probability that v remains uninfected by u is

$$r_{\mathcal{I}}(\mu) = \sum_{i=1}^{N} \nu_i \prod_{j=1}^{i} (1 - \mu_j \mu).$$

We now define $Q' = (G, \Pr')$ by taking

$$\Pr'(e) = 1 - r_{\mathcal{I}}(\Pr(e)).$$

This does what is required. □
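The effective edge probability of the lemma can be computed stage by stage. The sketch below assumes the reading of the proof in which an individual that is infected for exactly i stages has i independent chances to transmit, the j-th with probability mu_j times Pr(e). The function name and the list-based encoding of the infectivity pattern are hypothetical.

```python
from fractions import Fraction


def effective_probability(mu, infectivity, recovery):
    """Pr'(e) = 1 - r(mu) for an edge with Pr(e) = mu.

    infectivity[i-1] is mu_i and recovery[i-1] is p_i for stage I_i;
    the last recovery probability must be 1 (p_N = 1).
    """
    if len(infectivity) != len(recovery) or recovery[-1] != 1:
        raise ValueError("need one (mu_i, p_i) pair per stage and p_N = 1")
    stay = Fraction(1)    # probability of still being infected on entering stage i
    escape = Fraction(1)  # running product of (1 - mu_j * mu) for j <= i
    r = Fraction(0)
    for mu_i, p_i in zip(infectivity, recovery):
        escape *= 1 - mu_i * mu
        r += stay * p_i * escape  # (prob. of exactly i stages) * (v escapes all i)
        stay *= 1 - p_i
    return 1 - r
```

With a single stage of infectivity 1 the function returns mu unchanged, which recovers the basic model; a latent first stage (infectivity 0, recovery probability 0) followed by a fully infective stage also returns mu.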

Proof of Theorem 3. We will show that $\Pi_{\text{rel poly}}$ is polynomially
reducible to $\Pi_{\mathcal{I}}$.

Fix $\mathcal{I}$ to be a non-trivial pattern of infectivity. Suppose we are
given an instance $\pi$ of $\Pi_{\text{rel poly}}$. This consists of a graph G
and source and target vertices u and v. Suppose also that we have a polynomial
time algorithm for solving $\Pi_{\mathcal{I}}$. We choose N + 1 arbitrary
probabilities $p_0, \ldots, p_N$. These give us N + 1 instances of
$\Pi_{\mathcal{I}}$ by taking $Q_i = (G, \Pr_i)$, where $\Pr_i$ takes the
constant value $p_i$. By the previous lemma, solving these N + 1 instances of
$\Pi_{\mathcal{I}}$ solves N + 1 distinct instances of $\Pi_{\text{epidemic}}$
which consist of the graph G and differing constant functions $\Pr_i'$. These
N + 1 values give us N + 1 independent linear equations whose unknowns are the
coefficients of the reliability polynomial. Solving for these is a polynomial
time problem. □
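The last step of the proof, recovering polynomial coefficients from sufficiently many evaluations, amounts to solving a Vandermonde system. The following is a minimal sketch using exact rational arithmetic; it assumes the evaluation points are pairwise distinct, and the function name is hypothetical.

```python
from fractions import Fraction


def interpolate_coefficients(points):
    """Solve the Vandermonde system for c_0..c_d with sum_k c_k * x**k = y.

    `points` is a list of d + 1 pairs (x, y) with pairwise distinct x.
    Uses exact Gauss-Jordan elimination over the rationals.
    """
    n = len(points)
    # Augmented matrix: one row [1, x, x^2, ..., x^d | y] per evaluation point.
    rows = [[Fraction(x) ** k for k in range(n)] + [Fraction(y)] for x, y in points]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[k][n] for k in range(n)]
```

Gauss-Jordan elimination on a system with d + 1 unknowns takes on the order of d^3 arithmetic operations, which is why this step does not affect the polynomial bound on the reduction.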

Expected number of total infections. One might hope that while computing 
an individual's probability of infection is NP-hard, there might be a way to compute 
the expected number of infections. This, too, is NP-hard. Let us formalize this. 
An instance $\pi$ of $\Pi_{\text{expected}}$ is

• A graph $G = (V, E)$.

• A labelling $\Pr : E \to [0, 1] \cap \mathbb{Q}$.

• An initial infective set $P \subseteq V$.

A solution to $\pi$ is the expected number of infections.

The following theorem was proved in [28]. For the sake of completeness, we
provide a proof here.

Theorem 4. $\Pi_{\text{expected}}$ is NP-hard.

Proof. We will show that $\Pi_{\text{epidemic}}$ can be polynomially reduced to
$\Pi_{\text{expected}}$. Suppose we are given an instance $\pi$ of
$\Pi_{\text{epidemic}}$. Let $\pi'$ be the instance of $\Pi_{\text{epidemic}}$
which is formed from $\pi$ by appending a single edge from u to $v \notin V$ and
assigning $\Pr(u, v) = 1$. It is clear that the expected number of infections in
$\pi'$ differs from the expected number of infections in $\pi$ by exactly
$\Pr(u \mid P)$. Thus, if we had a polynomial time algorithm for finding the
expected number of infections, we could find the probability of any individual
becoming infected. □
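The reduction in this proof can be checked numerically on a small network. The sketch below computes expected infections by brute-force enumeration of edge subsets (relying on the percolation viewpoint of Theorem 1), then recovers the infection probability of u from the difference of two expected values. All function names and the fresh vertex label "v_new" are hypothetical; the label is assumed not to occur in the network already.

```python
from fractions import Fraction
from itertools import product


def infected_sets(pr, sources):
    """Yield (probability, set of eventually infected vertices) for every edge subset."""
    edges = list(pr)
    for keep in product((False, True), repeat=len(edges)):
        weight = Fraction(1)
        for e, kept in zip(edges, keep):
            weight *= pr[e] if kept else 1 - pr[e]
        # Everything reachable from the sources along transmitting edges is infected.
        seen, stack = set(sources), list(sources)
        while stack:
            x = stack.pop()
            for (a, b), kept in zip(edges, keep):
                if kept and a == x and b not in seen:
                    seen.add(b)
                    stack.append(b)
        yield weight, seen


def expected_infections(pr, sources):
    """Expected number of individuals ever infected (initial infectives included)."""
    return sum(w * len(s) for w, s in infected_sets(pr, sources))


def infection_probability_via_gadget(pr, sources, u, new_vertex="v_new"):
    """Recover Pr(u | P) from two expected-infection computations."""
    extended = dict(pr)
    extended[(u, new_vertex)] = Fraction(1)  # certain edge to a fresh vertex
    return expected_infections(extended, sources) - expected_infections(pr, sources)
```

Because the new vertex is infected exactly when u is, the two expected values differ by Pr(u | P); whether the initial infectives are counted does not matter, since they contribute equally to both.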

The fact that $\Pi_{\text{rel poly}}$ is NP-hard suggests that the difficulty
lies not in the probabilities Pr but in the topology of G. One problem which we
have not addressed here is the question of calculating the probability of
infection in an {S, I, R} network where $G = G_t$ changes over time due to
stochastic births and deaths. It seems likely that this will also provide a
source of NP-hard problems. However, this requires a reformulation of the
underlying problem.

6. Discussion and conclusions 

It has been the purpose of this paper to draw the attention of network epidemiol- 
ogists to results in communications network reliability which shed light on questions 
regarding the computational aspects of epidemiology of {S, I, R} networks. 

Theorem 1 and Theorem 4 tell us that, in general, in the absence of a major
breakthrough in computer science, we cannot expect to be able to compute exact
probabilities of infection or the expected number of infections in large social
contact networks. As [29] points out, problems do not go away simply because we
have deemed them NP-hard.

Since the network engineers have been here before us, it is tempting to ask 
whether their solutions will work for epidemiologists. While we consider the case 
open, the prospects seem mixed. Network engineers are often in the position of 
being able to choose the class of networks under consideration. As opposed to
scale-free [45, 46] and small-world network structures [32, 47, 5], which
frequently arise from a self-organization process during the spontaneous growth
of a network, engineered or purposefully designed networks show rather different
structures. Some of
the classes that allow efficient calculations (exact or approximate) include trees, full 
graphs, series-parallel graphs [S^, and channel graphs [44]. Unfortunately, these 
classes of networks seem unrealistic as models of social contact networks. 

Network engineers have turned to Monte Carlo simulation for the calculation of 
estimates of network reliability. We would like to give pointers into their literature 
[48, 49, 50, 51, 52, 53, 54, 28]. This approach has received increased attention in the
last decade due to the power of modern computers and computing clusters. While 
Monte Carlo simulation only calculates an unbiased point estimator for reliability 
probabilities, increasing the number of simulated samples causes these estimates to 
converge to the actual value. 
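A Monte Carlo estimator of this kind can be sketched as follows. This is a minimal illustration using the percolation viewpoint (each edge is kept independently with its transmission probability), not one of the refined schemes from the cited literature; the function name and parameters are hypothetical.

```python
import random


def monte_carlo_infection_probability(pr, sources, target, samples=10_000, seed=None):
    """Estimate Pr(target | sources) by sampling which edges transmit."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # Draw one realisation of the random subgraph: keep e with probability Pr(e).
        out = {}
        for (a, b), p in pr.items():
            if rng.random() < p:
                out.setdefault(a, []).append(b)
        # Depth-first search from the initial infectives, stopping early on success.
        seen, stack = set(sources), list(sources)
        while stack and target not in seen:
            for b in out.get(stack.pop(), ()):
                if b not in seen:
                    seen.add(b)
                    stack.append(b)
        hits += target in seen
    return hits / samples
```

Each sample costs time linear in the size of the network, and the standard error of the estimate shrinks like the square root of p(1 - p)/n for n samples and true probability p. The price is that the result is an estimate rather than an exact value.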

The fact that efficient and precise algorithms for computing infection
probabilities are out of reach (see Theorems 1, 3 and 4) has real-world consequences.
Designing a response to an emerging epidemic can depend on determining the kind 
of epidemiological probabilities we have been discussing |S1]. The effectiveness of 
interventions during an emerging epidemic often crucially depends on timely im- 
plementation. Our results and those of [24] and [25] place an emphasis on the 
search for efficient and quick methods that give good approximations when applied
to real-world social networks.